Overview

Dataset statistics

                               Dataset A   Dataset B
Number of variables            12          12
Number of observations         446         446
Missing cells                  433         437
Missing cells (%)              8.1%        8.2%
Duplicate rows                 0           0
Duplicate rows (%)             0.0%        0.0%
Total size in memory           45.3 KiB    45.3 KiB
Average record size in memory  104.0 B     104.0 B
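
The percentage and per-record figures in this table can be recomputed from the raw counts. The sketch below assumes that "Missing cells (%)" is the missing-cell count divided by variables times observations, and that the average record size is the total memory divided by the number of observations. The function names are illustrative and are not part of ydata-profiling.

```python
def missing_cells_pct(missing: int, n_vars: int, n_obs: int) -> float:
    """Share of missing cells, in percent, over all cells in the table."""
    return 100.0 * missing / (n_vars * n_obs)


def avg_record_size_bytes(total_kib: float, n_obs: int) -> float:
    """Average bytes per row, given the total in-memory size in KiB."""
    return total_kib * 1024 / n_obs


# Counts copied from the Dataset statistics table.
pct_a = missing_cells_pct(433, 12, 446)
pct_b = missing_cells_pct(437, 12, 446)
record_size = avg_record_size_bytes(45.3, 446)

print(f"{pct_a:.1f}% {pct_b:.1f}% {record_size:.1f} B")
```

Under these assumptions the rounded results agree with the 8.1%, 8.2% and 104.0 B shown above.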

Variable types

             Dataset A   Dataset B
Numeric      4           5
Categorical  5           4
Text         3           3

Alerts

Dataset A                              Dataset B                                       Alert type
Parch is highly imbalanced (54.4%)     Alert not present in this dataset               Imbalance
Age has 80 (17.9%) missing values      Age has 84 (18.8%) missing values               Missing
Cabin has 352 (78.9%) missing values   Cabin has 351 (78.7%) missing values            Missing
PassengerId has unique values          PassengerId has unique values                   Unique
Name has unique values                 Name has unique values                          Unique
SibSp has 308 (69.1%) zeros            SibSp has 309 (69.3%) zeros                     Zeros
Fare has 10 (2.2%) zeros               Fare has 6 (1.3%) zeros                         Zeros
Alert not present in this dataset      Parch is highly overall correlated with SibSp   High correlation
Alert not present in this dataset      Sex is highly overall correlated with Survived  High correlation
Alert not present in this dataset      SibSp is highly overall correlated with Parch   High correlation
Alert not present in this dataset      Survived is highly overall correlated with Sex  High correlation
Alert not present in this dataset      Parch has 340 (76.2%) zeros                     Zeros
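
The imbalance figure for Parch in Dataset A can be approximated with a normalised Shannon-entropy score. This is a sketch under the assumption that the alert is scored as 1 - H / log2(k), with H the entropy of the class frequencies and k the number of classes. That formula is an assumption about how the profiler works, not something this report states. The value counts are those listed for Parch (Dataset A) under Common Values.

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """Return 1 - H/log2(k): 0 for a uniform column, approaching 1 as one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Parch value counts for Dataset A (values 0, 1, 2, 5, 3).
parch_a = [343, 59, 40, 3, 1]
score = imbalance_score(parch_a)
print(f"{score:.3f}")
```

With these counts the score comes out close to the 54.4% shown in the alert, which is consistent with the assumed formula but does not prove it.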

Reproduction

                        Dataset A                   Dataset B
Analysis started        2025-01-20 16:33:55.927681  2025-01-20 16:33:57.739080
Analysis finished       2025-01-20 16:33:57.735851  2025-01-20 16:34:00.130132
Duration                1.81 seconds                2.39 seconds
Software version        ydata-profiling v0.0.dev0   ydata-profiling v0.0.dev0
Download configuration  config.json                 config.json
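
The durations follow directly from the start and finish timestamps. A minimal check using only the standard library is sketched below; the helper name is illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed wall-clock time between two timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


# Timestamps copied from the Reproduction table.
dur_a = duration_seconds("2025-01-20 16:33:55.927681", "2025-01-20 16:33:57.735851")
dur_b = duration_seconds("2025-01-20 16:33:57.739080", "2025-01-20 16:34:00.130132")
print(f"{dur_a:.2f} s, {dur_b:.2f} s")
```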

Variables

PassengerId
Real number (ℝ)

              Dataset A   Dataset B
Distinct      446         446
Distinct (%)  100.0%      100.0%
Missing       0           0
Missing (%)   0.0%        0.0%
Infinite      0           0
Infinite (%)  0.0%        0.0%
Mean          441.37444   444.64574
Minimum       3           1
Maximum       891         890
Zeros         0           0
Zeros (%)     0.0%        0.0%
Negative      0           0
Negative (%)  0.0%        0.0%
Memory size   7.0 KiB     7.0 KiB

Quantile statistics

                           Dataset A   Dataset B
Minimum                    3           1
5-th percentile            40.25       52.75
Q1                         213.25      210.25
median                     442         444
Q3                         667         672.5
95-th percentile           854.5       840.5
Maximum                    891         890
Range                      888         889
Interquartile range (IQR)  453.75      462.25
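
Range and interquartile range are simple differences of the quantiles listed above (maximum minus minimum, and Q3 minus Q1). The sketch below recomputes them for PassengerId; the function name is illustrative.

```python
def spread(minimum: float, q1: float, q3: float, maximum: float) -> dict[str, float]:
    """Range and interquartile range from four summary quantiles."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


# Quantiles copied from the PassengerId table.
spread_a = spread(3, 213.25, 667, 891)
spread_b = spread(1, 210.25, 672.5, 890)
print(spread_a, spread_b)
```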

Descriptive statistics

                                 Dataset A      Dataset B
Standard deviation               261.76081      258.91901
Coefficient of variation (CV)    0.59305837     0.58230403
Kurtosis                         -1.1978902     -1.2611845
Mean                             441.37444      444.64574
Median Absolute Deviation (MAD)  227.5          231.5
Skewness                         0.032656171    0.00099488193
Sum                              196853         198312
Variance                         68518.72       67039.052
Monotonicity                     Not monotonic  Not monotonic
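
Several of these statistics are linked: the mean is the sum divided by the number of non-missing observations, the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. A small consistency check for Dataset A, with an illustrative helper name:

```python
def derived_stats(total: float, n: int, std: float) -> dict[str, float]:
    """Mean, variance and coefficient of variation from sum, count and std."""
    mean = total / n
    return {"mean": mean, "variance": std ** 2, "cv": std / mean}


# Sum, observation count and standard deviation for PassengerId, Dataset A.
stats_a = derived_stats(196853, 446, 261.76081)
print(stats_a)
```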
Histogram with fixed size bins (bins=50)
Dataset A, common values
Value               Count  Frequency (%)
62                  1      0.2%
763                 1      0.2%
185                 1      0.2%
875                 1      0.2%
184                 1      0.2%
452                 1      0.2%
35                  1      0.2%
794                 1      0.2%
190                 1      0.2%
793                 1      0.2%
Other values (436)  436    97.8%

Dataset B, common values
Value               Count  Frequency (%)
630                 1      0.2%
374                 1      0.2%
139                 1      0.2%
397                 1      0.2%
92                  1      0.2%
605                 1      0.2%
253                 1      0.2%
95                  1      0.2%
788                 1      0.2%
283                 1      0.2%
Other values (436)  436    97.8%

Dataset A, smallest values
Value  Count  Frequency (%)
3      1      0.2%
4      1      0.2%
5      1      0.2%
6      1      0.2%
8      1      0.2%
9      1      0.2%
10     1      0.2%
17     1      0.2%
18     1      0.2%
19     1      0.2%

Dataset B, smallest values
Value  Count  Frequency (%)
1      1      0.2%
5      1      0.2%
6      1      0.2%
7      1      0.2%
9      1      0.2%
13     1      0.2%
14     1      0.2%
20     1      0.2%
21     1      0.2%
22     1      0.2%

Survived
Categorical

              Dataset A   Dataset B
Distinct      2           2
Distinct (%)  0.4%        0.4%
Missing       0           0
Missing (%)   0.0%        0.0%
Memory size   7.0 KiB     7.0 KiB

Value counts, Dataset A: 0 (279), 1 (167)
Value counts, Dataset B: 0 (283), 1 (163)

Length

               Dataset A   Dataset B
Max length     1           1
Median length  1           1
Mean length    1           1
Min length     1           1

Characters and Unicode

                     Dataset A   Dataset B
Total characters     446         446
Distinct characters  2           2
Distinct categories  1           1
Distinct scripts     1           1
Distinct blocks      1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Dataset A   Dataset B
Unique      0           0
Unique (%)  0.0%        0.0%

Sample

         Dataset A   Dataset B
1st row  1           0
2nd row  1           0
3rd row  1           0
4th row  1           0
5th row  0           1

Common Values

Dataset A
Value  Count  Frequency (%)
0      279    62.6%
1      167    37.4%

Dataset B
Value  Count  Frequency (%)
0      283    63.5%
1      163    36.5%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for Dataset A and Dataset B]

Dataset A
Value  Count  Frequency (%)
0      279    62.6%
1      167    37.4%

Dataset B
Value  Count  Frequency (%)
0      283    63.5%
1      163    36.5%

Most occurring characters

ValueCountFrequency (%)
0 279
62.6%
1 167
37.4%
ValueCountFrequency (%)
0 283
63.5%
1 163
36.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
0 279
62.6%
1 167
37.4%
ValueCountFrequency (%)
0 283
63.5%
1 163
36.5%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
0 279
62.6%
1 167
37.4%
ValueCountFrequency (%)
0 283
63.5%
1 163
36.5%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
0 279
62.6%
1 167
37.4%
ValueCountFrequency (%)
0 283
63.5%
1 163
36.5%

Pclass
Categorical

              Dataset A   Dataset B
Distinct      3           3
Distinct (%)  0.7%        0.7%
Missing       0           0
Missing (%)   0.0%        0.0%
Memory size   7.0 KiB     7.0 KiB

Value counts, Dataset A: 3 (248), 1 (109), 2 (89)
Value counts, Dataset B: 3 (248), 1 (106), 2 (92)

Length

               Dataset A   Dataset B
Max length     1           1
Median length  1           1
Mean length    1           1
Min length     1           1

Characters and Unicode

                     Dataset A   Dataset B
Total characters     446         446
Distinct characters  3           3
Distinct categories  1           1
Distinct scripts     1           1
Distinct blocks      1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Dataset A   Dataset B
Unique      0           0
Unique (%)  0.0%        0.0%

Sample

         Dataset A   Dataset B
1st row  3           1
2nd row  3           3
3rd row  2           3
4th row  2           3
5th row  3           1

Common Values

Dataset A
Value  Count  Frequency (%)
3      248    55.6%
1      109    24.4%
2      89     20.0%

Dataset B
Value  Count  Frequency (%)
3      248    55.6%
1      106    23.8%
2      92     20.6%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for Dataset A and Dataset B]

Dataset A
Value  Count  Frequency (%)
3      248    55.6%
1      109    24.4%
2      89     20.0%

Dataset B
Value  Count  Frequency (%)
3      248    55.6%
1      106    23.8%
2      92     20.6%

Most occurring characters

ValueCountFrequency (%)
3 248
55.6%
1 109
24.4%
2 89
 
20.0%
ValueCountFrequency (%)
3 248
55.6%
1 106
23.8%
2 92
 
20.6%

Most occurring categories

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
3 248
55.6%
1 109
24.4%
2 89
 
20.0%
ValueCountFrequency (%)
3 248
55.6%
1 106
23.8%
2 92
 
20.6%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
3 248
55.6%
1 109
24.4%
2 89
 
20.0%
ValueCountFrequency (%)
3 248
55.6%
1 106
23.8%
2 92
 
20.6%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
3 248
55.6%
1 109
24.4%
2 89
 
20.0%
ValueCountFrequency (%)
3 248
55.6%
1 106
23.8%
2 92
 
20.6%

Name
Text

              Dataset A   Dataset B
Distinct      446         446
Distinct (%)  100.0%      100.0%
Missing       0           0
Missing (%)   0.0%        0.0%
Memory size   7.0 KiB     7.0 KiB

Length

               Dataset A   Dataset B
Max length     61          67
Median length  46          46
Mean length    26.560538   26.576233
Min length     13          13

Characters and Unicode

                     Dataset A   Dataset B
Total characters     11846       11853
Distinct characters  60          59
Distinct categories  1           1
Distinct scripts     1           1
Distinct blocks      1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Dataset A   Dataset B
Unique      446         446
Unique (%)  100.0%      100.0%

Sample

         Dataset A                              Dataset B
1st row  Barah, Mr. Hanna Assi                  Ringhini, Mr. Sante
2nd row  Kink-Heilmann, Miss. Luise Gretchen    Osen, Mr. Olaf Elon
3rd row  Abelson, Mrs. Samuel (Hannah Wizosky)  Olsson, Miss. Elina
4th row  Becker, Master. Richard F              Andreasson, Mr. Paul Edvin
5th row  Hagland, Mr. Ingvald Olai Olsen        Homer, Mr. Harry ("Mr E Haven")
ValueCountFrequency (%)
mr 267
 
14.9%
miss 94
 
5.2%
mrs 55
 
3.1%
william 36
 
2.0%
john 21
 
1.2%
henry 20
 
1.1%
master 20
 
1.1%
james 17
 
0.9%
george 15
 
0.8%
charles 14
 
0.8%
Other values (881) 1233
68.8%
ValueCountFrequency (%)
mr 274
 
15.2%
miss 86
 
4.8%
mrs 58
 
3.2%
william 29
 
1.6%
john 22
 
1.2%
henry 20
 
1.1%
master 19
 
1.1%
george 17
 
0.9%
thomas 14
 
0.8%
charles 14
 
0.8%
Other values (891) 1249
69.3%

Most occurring characters

ValueCountFrequency (%)
1347
 
11.4%
r 962
 
8.1%
a 850
 
7.2%
e 814
 
6.9%
i 677
 
5.7%
n 657
 
5.5%
s 642
 
5.4%
M 567
 
4.8%
l 519
 
4.4%
o 496
 
4.2%
Other values (50) 4315
36.4%
ValueCountFrequency (%)
1357
 
11.4%
r 980
 
8.3%
e 814
 
6.9%
a 805
 
6.8%
s 657
 
5.5%
n 647
 
5.5%
i 634
 
5.3%
M 551
 
4.6%
l 527
 
4.4%
o 506
 
4.3%
Other values (49) 4375
36.9%

Most occurring categories

ValueCountFrequency (%)
(unknown) 11846
100.0%
ValueCountFrequency (%)
(unknown) 11853
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
1347
 
11.4%
r 962
 
8.1%
a 850
 
7.2%
e 814
 
6.9%
i 677
 
5.7%
n 657
 
5.5%
s 642
 
5.4%
M 567
 
4.8%
l 519
 
4.4%
o 496
 
4.2%
Other values (50) 4315
36.4%
ValueCountFrequency (%)
1357
 
11.4%
r 980
 
8.3%
e 814
 
6.9%
a 805
 
6.8%
s 657
 
5.5%
n 647
 
5.5%
i 634
 
5.3%
M 551
 
4.6%
l 527
 
4.4%
o 506
 
4.3%
Other values (49) 4375
36.9%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 11846
100.0%
ValueCountFrequency (%)
(unknown) 11853
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
1347
 
11.4%
r 962
 
8.1%
a 850
 
7.2%
e 814
 
6.9%
i 677
 
5.7%
n 657
 
5.5%
s 642
 
5.4%
M 567
 
4.8%
l 519
 
4.4%
o 496
 
4.2%
Other values (50) 4315
36.4%
ValueCountFrequency (%)
1357
 
11.4%
r 980
 
8.3%
e 814
 
6.9%
a 805
 
6.8%
s 657
 
5.5%
n 647
 
5.5%
i 634
 
5.3%
M 551
 
4.6%
l 527
 
4.4%
o 506
 
4.3%
Other values (49) 4375
36.9%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 11846
100.0%
ValueCountFrequency (%)
(unknown) 11853
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
1347
 
11.4%
r 962
 
8.1%
a 850
 
7.2%
e 814
 
6.9%
i 677
 
5.7%
n 657
 
5.5%
s 642
 
5.4%
M 567
 
4.8%
l 519
 
4.4%
o 496
 
4.2%
Other values (50) 4315
36.4%
ValueCountFrequency (%)
1357
 
11.4%
r 980
 
8.3%
e 814
 
6.9%
a 805
 
6.8%
s 657
 
5.5%
n 647
 
5.5%
i 634
 
5.3%
M 551
 
4.6%
l 527
 
4.4%
o 506
 
4.3%
Other values (49) 4375
36.9%

Sex
Categorical

              Dataset A   Dataset B
Distinct      2           2
Distinct (%)  0.4%        0.4%
Missing       0           0
Missing (%)   0.0%        0.0%
Memory size   7.0 KiB     7.0 KiB

Value counts, Dataset A: male (295), female (151)
Value counts, Dataset B: male (301), female (145)

Length

               Dataset A   Dataset B
Max length     6           6
Median length  4           4
Mean length    4.67713     4.6502242
Min length     4           4

Characters and Unicode

                     Dataset A   Dataset B
Total characters     2086        2074
Distinct characters  5           5
Distinct categories  1           1
Distinct scripts     1           1
Distinct blocks      1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
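
The total-character and mean-length figures for Sex can be reproduced from the category counts alone, since every value is either "male" (4 characters) or "female" (6 characters). A minimal sketch with an illustrative helper name:

```python
def length_stats(counts: dict[str, int]) -> tuple[int, float]:
    """Total characters and mean string length for a categorical column."""
    total_chars = sum(len(value) * count for value, count in counts.items())
    n = sum(counts.values())
    return total_chars, total_chars / n


# Category counts for Sex in each dataset.
chars_a, mean_a = length_stats({"male": 295, "female": 151})
chars_b, mean_b = length_stats({"male": 301, "female": 145})
print(chars_a, round(mean_a, 5), chars_b, round(mean_b, 7))
```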

Unique

            Dataset A   Dataset B
Unique      0           0
Unique (%)  0.0%        0.0%

Sample

         Dataset A   Dataset B
1st row  male        male
2nd row  female      male
3rd row  female      female
4th row  male        male
5th row  male        male

Common Values

Dataset A
Value   Count  Frequency (%)
male    295    66.1%
female  151    33.9%

Dataset B
Value   Count  Frequency (%)
male    301    67.5%
female  145    32.5%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for Dataset A and Dataset B]

Dataset A
Value   Count  Frequency (%)
male    295    66.1%
female  151    33.9%

Dataset B
Value   Count  Frequency (%)
male    301    67.5%
female  145    32.5%

Most occurring characters

ValueCountFrequency (%)
e 597
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 151
 
7.2%
ValueCountFrequency (%)
e 591
28.5%
m 446
21.5%
a 446
21.5%
l 446
21.5%
f 145
 
7.0%

Most occurring categories

ValueCountFrequency (%)
(unknown) 2086
100.0%
ValueCountFrequency (%)
(unknown) 2074
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
e 597
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 151
 
7.2%
ValueCountFrequency (%)
e 591
28.5%
m 446
21.5%
a 446
21.5%
l 446
21.5%
f 145
 
7.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2086
100.0%
ValueCountFrequency (%)
(unknown) 2074
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
e 597
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 151
 
7.2%
ValueCountFrequency (%)
e 591
28.5%
m 446
21.5%
a 446
21.5%
l 446
21.5%
f 145
 
7.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2086
100.0%
ValueCountFrequency (%)
(unknown) 2074
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
e 597
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 151
 
7.2%
ValueCountFrequency (%)
e 591
28.5%
m 446
21.5%
a 446
21.5%
l 446
21.5%
f 145
 
7.0%

Age
Real number (ℝ)

              Dataset A   Dataset B
Distinct      75          75
Distinct (%)  20.5%       20.7%
Missing       80          84
Missing (%)   17.9%       18.8%
Infinite      0           0
Infinite (%)  0.0%        0.0%
Mean          29.393224   30.265663
Minimum       0.42        0.42
Maximum       80          80
Zeros         0           0
Zeros (%)     0.0%        0.0%
Negative      0           0
Negative (%)  0.0%        0.0%
Memory size   7.0 KiB     7.0 KiB
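
For Age, the Distinct (%) values only work out if the denominator excludes missing observations, while Missing (%) uses all 446 rows. This is inferred from the numbers in this report rather than from the profiler's documentation, so treat the sketch below as a consistency check under that assumption. The function names are illustrative.

```python
def distinct_pct(distinct: int, n_obs: int, missing: int) -> float:
    """Distinct values as a percentage of non-missing observations."""
    return 100.0 * distinct / (n_obs - missing)


def missing_pct(missing: int, n_obs: int) -> float:
    """Missing values as a percentage of all observations."""
    return 100.0 * missing / n_obs


# Age: 75 distinct values in both datasets, 80 and 84 missing out of 446 rows.
age_distinct_a = distinct_pct(75, 446, 80)
age_distinct_b = distinct_pct(75, 446, 84)
age_missing_a = missing_pct(80, 446)
age_missing_b = missing_pct(84, 446)
print(f"{age_distinct_a:.1f}% {age_distinct_b:.1f}% {age_missing_a:.1f}% {age_missing_b:.1f}%")
```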

Quantile statistics

                           Dataset A   Dataset B
Minimum                    0.42        0.42
5-th percentile            3           4
Q1                         21          20.125
median                     28          28
Q3                         38          39
95-th percentile           54          60
Maximum                    80          80
Range                      79.58       79.58
Interquartile range (IQR)  17          18.875

Descriptive statistics

                                 Dataset A      Dataset B
Standard deviation               14.549416      15.384181
Coefficient of variation (CV)    0.49499218     0.50830478
Kurtosis                         0.3030229      0.23584033
Mean                             29.393224      30.265663
Median Absolute Deviation (MAD)  8              9
Skewness                         0.31109356     0.55417231
Sum                              10757.92       10956.17
Variance                         211.6855       236.67303
Monotonicity                     Not monotonic  Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
24 18
 
4.0%
19 14
 
3.1%
26 13
 
2.9%
21 13
 
2.9%
35 12
 
2.7%
36 12
 
2.7%
25 12
 
2.7%
27 12
 
2.7%
28 12
 
2.7%
32 11
 
2.5%
Other values (65) 237
53.1%
(Missing) 80
 
17.9%
ValueCountFrequency (%)
21 19
 
4.3%
24 17
 
3.8%
16 14
 
3.1%
36 13
 
2.9%
19 13
 
2.9%
22 13
 
2.9%
30 12
 
2.7%
29 12
 
2.7%
20 11
 
2.5%
25 11
 
2.5%
Other values (65) 227
50.9%
(Missing) 84
 
18.8%
ValueCountFrequency (%)
0.42 1
 
0.2%
0.67 1
 
0.2%
0.75 1
 
0.2%
0.83 2
 
0.4%
0.92 1
 
0.2%
1 5
1.1%
2 6
1.3%
3 4
0.9%
4 5
1.1%
5 2
 
0.4%
ValueCountFrequency (%)
0.42 1
 
0.2%
0.83 1
 
0.2%
0.92 1
 
0.2%
1 4
0.9%
2 6
1.3%
3 4
0.9%
4 3
0.7%
5 2
 
0.4%
6 1
 
0.2%
8 2
 
0.4%

SibSp
Real number (ℝ)

              Dataset A    Dataset B
Distinct      7            7
Distinct (%)  1.6%         1.6%
Missing       0            0
Missing (%)   0.0%         0.0%
Infinite      0            0
Infinite (%)  0.0%         0.0%
Mean          0.51569507   0.5
Minimum       0            0
Maximum       8            8
Zeros         308          309
Zeros (%)     69.1%        69.3%
Negative      0            0
Negative (%)  0.0%         0.0%
Memory size   7.0 KiB      7.0 KiB

Quantile statistics

                           Dataset A   Dataset B
Minimum                    0           0
5-th percentile            0           0
Q1                         0           0
median                     0           0
Q3                         1           1
95-th percentile           2.75        2
Maximum                    8           8
Range                      8           8
Interquartile range (IQR)  1           1

Descriptive statistics

                                 Dataset A      Dataset B
Standard deviation               1.1090943      1.0721186
Coefficient of variation (CV)    2.1506785      2.1442371
Kurtosis                         19.144275      18.487441
Mean                             0.51569507     0.5
Median Absolute Deviation (MAD)  0              0
Skewness                         3.8120982      3.7477367
Sum                              230            223
Variance                         1.2300902      1.1494382
Monotonicity                     Not monotonic  Not monotonic
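
Because SibSp takes only seven distinct values, its sum, mean and share of zeros can be rebuilt from the value counts listed for this variable. The sketch below does that for both datasets; the helper name is illustrative.

```python
def summarize(value_counts: dict[int, int]) -> dict[str, float]:
    """Sum, mean and percentage of zeros from a value -> count mapping."""
    n = sum(value_counts.values())
    total = sum(value * count for value, count in value_counts.items())
    return {
        "n": n,
        "sum": total,
        "mean": total / n,
        "zeros_pct": 100.0 * value_counts.get(0, 0) / n,
    }


# SibSp value counts as reported for each dataset.
sibsp_a = summarize({0: 308, 1: 100, 2: 15, 3: 9, 4: 9, 5: 1, 8: 4})
sibsp_b = summarize({0: 309, 1: 102, 2: 13, 3: 9, 4: 6, 5: 4, 8: 3})
print(sibsp_a, sibsp_b)
```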
Histogram with fixed size bins (bins=7)
Dataset A, common values
Value  Count  Frequency (%)
0      308    69.1%
1      100    22.4%
2      15     3.4%
3      9      2.0%
4      9      2.0%
8      4      0.9%
5      1      0.2%

Dataset B, common values
Value  Count  Frequency (%)
0      309    69.3%
1      102    22.9%
2      13     2.9%
3      9      2.0%
4      6      1.3%
5      4      0.9%
8      3      0.7%

Dataset A, smallest values
Value  Count  Frequency (%)
0      308    69.1%
1      100    22.4%
2      15     3.4%
3      9      2.0%
4      9      2.0%
5      1      0.2%
8      4      0.9%

Dataset B, smallest values
Value  Count  Frequency (%)
0      309    69.3%
1      102    22.9%
2      13     2.9%
3      9      2.0%
4      6      1.3%
5      4      0.9%
8      3      0.7%

Parch
Categorical

              Dataset A   Dataset B
Distinct      5           7
Distinct (%)  1.1%        1.6%
Missing       0           0
Missing (%)   0.0%        0.0%
Memory size   7.0 KiB     7.0 KiB

Value counts, Dataset A: 0 (343), 1 (59), 2 (40), 5 (3), 3 (1)
Value counts, Dataset B: 0 (340), 1 (54), 2 (44), 3 (4), 5 (2), Other values (2) (2)

Length

Max length     1
Median length  1
Mean length    1
Min length     1

Characters and Unicode

Total characters     446
Distinct characters  5
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Dataset A   Dataset B
Unique      1           2
Unique (%)  0.2%        0.4%

Sample

1st row  0
2nd row  2
3rd row  0
4th row  1
5th row  0

Common Values

Dataset A
Value  Count  Frequency (%)
0      343    76.9%
1      59     13.2%
2      40     9.0%
5      3      0.7%
3      1      0.2%

Dataset B
Value  Count  Frequency (%)
0      340    76.2%
1      54     12.1%
2      44     9.9%
3      4      0.9%
5      2      0.4%
4      1      0.2%
6      1      0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for Dataset A and Dataset B]

Dataset A
Value  Count  Frequency (%)
0      343    76.9%
1      59     13.2%
2      40     9.0%
5      3      0.7%
3      1      0.2%

Most occurring characters

ValueCountFrequency (%)
0 343
76.9%
1 59
 
13.2%
2 40
 
9.0%
5 3
 
0.7%
3 1
 
0.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
0 343
76.9%
1 59
 
13.2%
2 40
 
9.0%
5 3
 
0.7%
3 1
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
0 343
76.9%
1 59
 
13.2%
2 40
 
9.0%
5 3
 
0.7%
3 1
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
0 343
76.9%
1 59
 
13.2%
2 40
 
9.0%
5 3
 
0.7%
3 1
 
0.2%

Ticket
Text

              Dataset A   Dataset B
Distinct      385         383
Distinct (%)  86.3%       85.9%
Missing       0           0
Missing (%)   0.0%        0.0%
Memory size   7.0 KiB     7.0 KiB

Length

               Dataset A   Dataset B
Max length     18          18
Median length  17          17
Mean length    6.7443946   6.7376682
Min length     3           3

Characters and Unicode

                     Dataset A   Dataset B
Total characters     3008        3005
Distinct characters  31          35
Distinct categories  1           1
Distinct scripts     1           1
Distinct blocks      1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Dataset A   Dataset B
Unique      340         331
Unique (%)  76.2%       74.2%

Sample

         Dataset A   Dataset B
1st row  2663        PC 17760
2nd row  315153      7534
3rd row  P/PP 3381   350407
4th row  230136      347466
5th row  65303       111426
ValueCountFrequency (%)
pc 34
 
6.0%
a/5 11
 
2.0%
c.a 9
 
1.6%
3101295 5
 
0.9%
ca 5
 
0.9%
soton/o.q 5
 
0.9%
soton/oq 5
 
0.9%
ston/o 4
 
0.7%
2 4
 
0.7%
sc/paris 4
 
0.7%
Other values (408) 477
84.7%
ValueCountFrequency (%)
pc 29
 
5.1%
c.a 11
 
1.9%
a/5 11
 
1.9%
ca 9
 
1.6%
2 8
 
1.4%
ston/o 8
 
1.4%
w./c 7
 
1.2%
a/4 5
 
0.9%
2144 5
 
0.9%
soton/oq 4
 
0.7%
Other values (401) 474
83.0%

Most occurring characters

ValueCountFrequency (%)
3 380
12.6%
1 344
11.4%
2 296
9.8%
7 232
 
7.7%
4 225
 
7.5%
6 213
 
7.1%
5 208
 
6.9%
0 198
 
6.6%
9 176
 
5.9%
8 135
 
4.5%
Other values (21) 601
20.0%
ValueCountFrequency (%)
3 373
12.4%
1 341
11.3%
2 315
10.5%
7 242
8.1%
4 239
 
8.0%
0 212
 
7.1%
6 198
 
6.6%
5 195
 
6.5%
9 151
 
5.0%
8 138
 
4.6%
Other values (25) 601
20.0%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3008
100.0%
ValueCountFrequency (%)
(unknown) 3005
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
3 380
12.6%
1 344
11.4%
2 296
9.8%
7 232
 
7.7%
4 225
 
7.5%
6 213
 
7.1%
5 208
 
6.9%
0 198
 
6.6%
9 176
 
5.9%
8 135
 
4.5%
Other values (21) 601
20.0%
ValueCountFrequency (%)
3 373
12.4%
1 341
11.3%
2 315
10.5%
7 242
8.1%
4 239
 
8.0%
0 212
 
7.1%
6 198
 
6.6%
5 195
 
6.5%
9 151
 
5.0%
8 138
 
4.6%
Other values (25) 601
20.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3008
100.0%
ValueCountFrequency (%)
(unknown) 3005
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
3 380
12.6%
1 344
11.4%
2 296
9.8%
7 232
 
7.7%
4 225
 
7.5%
6 213
 
7.1%
5 208
 
6.9%
0 198
 
6.6%
9 176
 
5.9%
8 135
 
4.5%
Other values (21) 601
20.0%
ValueCountFrequency (%)
3 373
12.4%
1 341
11.3%
2 315
10.5%
7 242
8.1%
4 239
 
8.0%
0 212
 
7.1%
6 198
 
6.6%
5 195
 
6.5%
9 151
 
5.0%
8 138
 
4.6%
Other values (25) 601
20.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3008
100.0%
ValueCountFrequency (%)
(unknown) 3005
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
3 380
12.6%
1 344
11.4%
2 296
9.8%
7 232
 
7.7%
4 225
 
7.5%
6 213
 
7.1%
5 208
 
6.9%
0 198
 
6.6%
9 176
 
5.9%
8 135
 
4.5%
Other values (21) 601
20.0%
ValueCountFrequency (%)
3 373
12.4%
1 341
11.3%
2 315
10.5%
7 242
8.1%
4 239
 
8.0%
0 212
 
7.1%
6 198
 
6.6%
5 195
 
6.5%
9 151
 
5.0%
8 138
 
4.6%
Other values (25) 601
20.0%

Fare
Real number (ℝ)

              Dataset A   Dataset B
Distinct      176         186
Distinct (%)  39.5%       41.7%
Missing       0           0
Missing (%)   0.0%        0.0%
Infinite      0           0
Infinite (%)  0.0%        0.0%
Mean          33.968955   32.567806
Minimum       0           0
Maximum       512.3292    512.3292
Zeros         10          6
Zeros (%)     2.2%        1.3%
Negative      0           0
Negative (%)  0.0%        0.0%
Memory size   7.0 KiB     7.0 KiB

Quantile statistics

                           Dataset A   Dataset B
Minimum                    0           0
5-th percentile            7.162525    7.225
Q1                         7.8958      7.8958
median                     14.4542     13.64585
Q3                         30.3927     30.5
95-th percentile           120         118.31875
Maximum                    512.3292    512.3292
Range                      512.3292    512.3292
Interquartile range (IQR)  22.4969     22.6042

Descriptive statistics

                                 Dataset A      Dataset B
Standard deviation               57.88634       53.178015
Coefficient of variation (CV)    1.7040954      1.63284
Kurtosis                         32.795923      33.039693
Mean                             33.968955      32.567806
Median Absolute Deviation (MAD)  6.7667         6.39585
Skewness                         4.9749464      4.8821254
Sum                              15150.154      14525.241
Variance                         3350.8284      2827.9013
Monotonicity                     Not monotonic  Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
8.05 26
 
5.8%
7.8958 22
 
4.9%
13 19
 
4.3%
26 15
 
3.4%
7.75 14
 
3.1%
10.5 14
 
3.1%
0 10
 
2.2%
26.55 10
 
2.2%
8.6625 9
 
2.0%
7.25 9
 
2.0%
Other values (166) 298
66.8%
ValueCountFrequency (%)
8.05 21
 
4.7%
7.8958 20
 
4.5%
13 19
 
4.3%
26 18
 
4.0%
7.75 13
 
2.9%
10.5 12
 
2.7%
7.775 11
 
2.5%
7.925 10
 
2.2%
7.8542 8
 
1.8%
26.55 8
 
1.8%
Other values (176) 306
68.6%
ValueCountFrequency (%)
0 10
2.2%
6.45 1
 
0.2%
6.4958 1
 
0.2%
6.975 1
 
0.2%
7.05 4
 
0.9%
7.0542 1
 
0.2%
7.125 4
 
0.9%
7.1417 1
 
0.2%
7.225 4
 
0.9%
7.2292 8
1.8%
ValueCountFrequency (%)
0 6
1.3%
6.2375 1
 
0.2%
6.75 2
 
0.4%
6.8583 1
 
0.2%
6.95 1
 
0.2%
6.975 1
 
0.2%
7.0458 1
 
0.2%
7.05 3
0.7%
7.0542 1
 
0.2%
7.125 2
 
0.4%

Cabin
Text

              Dataset A   Dataset B
Distinct      82          82
Distinct (%)  87.2%       86.3%
Missing       352         351
Missing (%)   78.9%       78.7%
Memory size   7.0 KiB     7.0 KiB

Length

               Dataset A   Dataset B
Max length     11          15
Median length  3           3
Mean length    3.7659574   3.6736842
Min length     2           1

Characters and Unicode

                     Dataset A   Dataset B
Total characters     354         349
Distinct characters  18          18
Distinct categories  1           1
Distinct scripts     1           1
Distinct blocks      1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Dataset A   Dataset B
Unique      72          71
Unique (%)  76.6%       74.7%

Sample

         Dataset A     Dataset B
1st row  F4            C87
2nd row  C23 C25 C27   C23 C25 C27
3rd row  B38           E44
4th row  C70           D37
5th row  A5            B94
ValueCountFrequency (%)
c23 3
 
2.7%
c25 3
 
2.7%
c27 3
 
2.7%
b96 3
 
2.7%
b98 3
 
2.7%
f 3
 
2.7%
b58 2
 
1.8%
c124 2
 
1.8%
d20 2
 
1.8%
d33 2
 
1.8%
Other values (81) 87
77.0%
ValueCountFrequency (%)
g6 3
 
2.6%
e101 3
 
2.6%
b96 2
 
1.8%
b98 2
 
1.8%
e25 2
 
1.8%
f33 2
 
1.8%
b28 2
 
1.8%
e67 2
 
1.8%
d26 2
 
1.8%
c23 2
 
1.8%
Other values (83) 92
80.7%

Most occurring characters

ValueCountFrequency (%)
2 39
11.0%
C 35
 
9.9%
B 31
 
8.8%
3 29
 
8.2%
1 29
 
8.2%
8 23
 
6.5%
5 21
 
5.9%
19
 
5.4%
0 19
 
5.4%
6 17
 
4.8%
Other values (8) 92
26.0%
ValueCountFrequency (%)
B 34
 
9.7%
6 31
 
8.9%
C 30
 
8.6%
2 30
 
8.6%
1 27
 
7.7%
5 26
 
7.4%
3 25
 
7.2%
7 19
 
5.4%
19
 
5.4%
D 17
 
4.9%
Other values (8) 91
26.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 354
100.0%
ValueCountFrequency (%)
(unknown) 349
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
2 39
11.0%
C 35
 
9.9%
B 31
 
8.8%
3 29
 
8.2%
1 29
 
8.2%
8 23
 
6.5%
5 21
 
5.9%
19
 
5.4%
0 19
 
5.4%
6 17
 
4.8%
Other values (8) 92
26.0%
ValueCountFrequency (%)
B 34
 
9.7%
6 31
 
8.9%
C 30
 
8.6%
2 30
 
8.6%
1 27
 
7.7%
5 26
 
7.4%
3 25
 
7.2%
7 19
 
5.4%
19
 
5.4%
D 17
 
4.9%
Other values (8) 91
26.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 354
100.0%
ValueCountFrequency (%)
(unknown) 349
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
2 39
11.0%
C 35
 
9.9%
B 31
 
8.8%
3 29
 
8.2%
1 29
 
8.2%
8 23
 
6.5%
5 21
 
5.9%
19
 
5.4%
0 19
 
5.4%
6 17
 
4.8%
Other values (8) 92
26.0%
ValueCountFrequency (%)
B 34
 
9.7%
6 31
 
8.9%
C 30
 
8.6%
2 30
 
8.6%
1 27
 
7.7%
5 26
 
7.4%
3 25
 
7.2%
7 19
 
5.4%
19
 
5.4%
D 17
 
4.9%
Other values (8) 91
26.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 354
100.0%
ValueCountFrequency (%)
(unknown) 349
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
2 39
11.0%
C 35
 
9.9%
B 31
 
8.8%
3 29
 
8.2%
1 29
 
8.2%
8 23
 
6.5%
5 21
 
5.9%
19
 
5.4%
0 19
 
5.4%
6 17
 
4.8%
Other values (8) 92
26.0%
ValueCountFrequency (%)
B 34
 
9.7%
6 31
 
8.9%
C 30
 
8.6%
2 30
 
8.6%
1 27
 
7.7%
5 26
 
7.4%
3 25
 
7.2%
7 19
 
5.4%
19
 
5.4%
D 17
 
4.9%
Other values (8) 91
26.1%

Embarked
Categorical

              Dataset A   Dataset B
Distinct      3           3
Distinct (%)  0.7%        0.7%
Missing       1           2
Missing (%)   0.2%        0.4%
Memory size   7.0 KiB     7.0 KiB

Value counts, Dataset A: S (316), C (94), Q (35)
Value counts, Dataset B: S (327), C (82), Q (35)

Length

               Dataset A   Dataset B
Max length     1           1
Median length  1           1
Mean length    1           1
Min length     1           1

Characters and Unicode

                     Dataset A   Dataset B
Total characters     445         444
Distinct characters  3           3
Distinct categories  1           1
Distinct scripts     1           1
Distinct blocks      1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Dataset A   Dataset B
Unique      0           0
Unique (%)  0.0%        0.0%

Sample

         Dataset A   Dataset B
1st row  C           C
2nd row  S           S
3rd row  C           S
4th row  S           S
5th row  S           C

Common Values

Dataset A
Value      Count  Frequency (%)
S          316    70.9%
C          94     21.1%
Q          35     7.8%
(Missing)  1      0.2%

Dataset B
Value      Count  Frequency (%)
S          327    73.3%
C          82     18.4%
Q          35     7.8%
(Missing)  2      0.4%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for Dataset A and Dataset B]

Dataset A
Value  Count  Frequency (%)
s      316    71.0%
c      94     21.1%
q      35     7.9%

Dataset B
Value  Count  Frequency (%)
s      327    73.6%
c      82     18.5%
q      35     7.9%

Most occurring characters

ValueCountFrequency (%)
S 316
71.0%
C 94
 
21.1%
Q 35
 
7.9%
ValueCountFrequency (%)
S 327
73.6%
C 82
 
18.5%
Q 35
 
7.9%

Most occurring categories

ValueCountFrequency (%)
(unknown) 445
100.0%
ValueCountFrequency (%)
(unknown) 444
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
S 316
71.0%
C 94
 
21.1%
Q 35
 
7.9%
ValueCountFrequency (%)
S 327
73.6%
C 82
 
18.5%
Q 35
 
7.9%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 445
100.0%
ValueCountFrequency (%)
(unknown) 444
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
S 316
71.0%
C 94
 
21.1%
Q 35
 
7.9%
ValueCountFrequency (%)
S 327
73.6%
C 82
 
18.5%
Q 35
 
7.9%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 445
100.0%
ValueCountFrequency (%)
(unknown) 444
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
S 316
71.0%
C 94
 
21.1%
Q 35
 
7.9%
ValueCountFrequency (%)
S 327
73.6%
C 82
 
18.5%
Q 35
 
7.9%

Interactions

[Figures: 25 pairwise interaction plots. Dataset B has a plot in every panel; Dataset A has a plot in 16 panels, and the other 9 read "Interaction plot not present for dataset".]

Correlations

[Figures: correlation plots for Dataset A and Dataset B.]

Dataset A

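            | Age    | Embarked | Fare   | Parch  | PassengerId | Pclass | Sex    | SibSp  | Survived
Age         | 1.000  | 0.058    | 0.092  | 0.268  | 0.004       | 0.260  | 0.097  | -0.244 | 0.184
Embarked    | 0.058  | 1.000    | 0.212  | 0.094  | 0.000       | 0.275  | 0.160  | 0.065  | 0.146
Fare        | 0.092  | 0.212    | 1.000  | 0.140  | -0.034      | 0.468  | 0.160  | 0.445  | 0.298
Parch       | 0.268  | 0.094    | 0.140  | 1.000  | 0.000       | 0.049  | 0.302  | 0.270  | 0.224
PassengerId | 0.004  | 0.000    | -0.034 | 0.000  | 1.000       | 0.007  | 0.080  | -0.093 | 0.125
Pclass      | 0.260  | 0.275    | 0.468  | 0.049  | 0.007       | 1.000  | 0.054  | 0.135  | 0.334
Sex         | 0.097  | 0.160    | 0.160  | 0.302  | 0.080       | 0.054  | 1.000  | 0.265  | 0.477
SibSp       | -0.244 | 0.065    | 0.445  | 0.270  | -0.093      | 0.135  | 0.265  | 1.000  | 0.231
Survived    | 0.184  | 0.146    | 0.298  | 0.224  | 0.125       | 0.334  | 0.477  | 0.231  | 1.000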

Dataset B

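            | Age    | Embarked | Fare   | Parch  | PassengerId | Pclass | Sex    | SibSp  | Survived
Age         | 1.000  | 0.142    | 0.157  | -0.185 | 0.025       | 0.291  | 0.089  | -0.216 | 0.094
Embarked    | 0.142  | 1.000    | 0.202  | 0.000  | 0.000       | 0.257  | 0.092  | 0.000  | 0.175
Fare        | 0.157  | 0.202    | 1.000  | 0.432  | 0.048       | 0.454  | 0.200  | 0.415  | 0.234
Parch       | -0.185 | 0.000    | 0.432  | 1.000  | 0.007       | 0.052  | 0.240  | 0.501  | 0.047
PassengerId | 0.025  | 0.000    | 0.048  | 0.007  | 1.000       | 0.000  | 0.070  | 0.020  | 0.075
Pclass      | 0.291  | 0.257    | 0.454  | 0.052  | 0.000       | 1.000  | 0.138  | 0.102  | 0.325
Sex         | 0.089  | 0.092    | 0.200  | 0.240  | 0.070       | 0.138  | 1.000  | 0.196  | 0.550
SibSp       | -0.216 | 0.000    | 0.415  | 0.501  | 0.020       | 0.102  | 0.196  | 1.000  | 0.128
Survived    | 0.094  | 0.175    | 0.234  | 0.047  | 0.075       | 0.325  | 0.550  | 0.128  | 1.000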

Missing values

[Figures: nullity count by column, Dataset A and Dataset B.]
A simple visualization of nullity by column.

[Figures: nullity matrix, Dataset A and Dataset B.]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

[Figures: nullity correlation heatmap, Dataset A and Dataset B.]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
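
One common way to quantify nullity correlation is the Pearson correlation between the "is missing" indicators of two columns. The sketch below illustrates that idea in plain Python; it is an assumption about the general technique, not a copy of what the report computes, and the function name `nullity_correlation` and the sample columns are hypothetical. With pandas, `df.isnull().corr()` is a usual way to get the same quantity for every column pair.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns.

    Missing entries are represented as None. Returns None when either
    column is entirely missing or entirely present, because the
    correlation is undefined for a constant indicator.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    if n == 0:
        return None
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)


# Hypothetical columns, not taken from the report.
age = [22.0, None, 30.0, None]
cabin_same = ["C85", None, "E46", None]      # missing in the same rows as age
cabin_opposite = [None, "B28", None, "D33"]  # missing exactly where age is present

print(nullity_correlation(age, cabin_same))
print(nullity_correlation(age, cabin_opposite))
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the missingness patterns are unrelated.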

Sample

Dataset A

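(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
762 | 763 | 1 | 3 | Barah, Mr. Hanna Assi | male | 20.0 | 0 | 0 | 2663 | 7.2292 | NaN | C
184 | 185 | 1 | 3 | Kink-Heilmann, Miss. Luise Gretchen | female | 4.0 | 0 | 2 | 315153 | 22.0250 | NaN | S
874 | 875 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C
183 | 184 | 1 | 2 | Becker, Master. Richard F | male | 1.0 | 2 | 1 | 230136 | 39.0000 | F4 | S
451 | 452 | 0 | 3 | Hagland, Mr. Ingvald Olai Olsen | male | NaN | 1 | 0 | 65303 | 19.9667 | NaN | S
34 | 35 | 0 | 1 | Meyer, Mr. Edgar Joseph | male | 28.0 | 1 | 0 | PC 17604 | 82.1708 | NaN | C
793 | 794 | 0 | 1 | Hoyt, Mr. William Fisher | male | NaN | 0 | 0 | PC 17600 | 30.6958 | NaN | C
189 | 190 | 0 | 3 | Turcin, Mr. Stjepan | male | 36.0 | 0 | 0 | 349247 | 7.8958 | NaN | S
792 | 793 | 0 | 3 | Sage, Miss. Stella Anna | female | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S
474 | 475 | 0 | 3 | Strandberg, Miss. Ida Sofia | female | 22.0 | 0 | 0 | 7553 | 9.8375 | NaN | S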

Dataset B

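(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
373 | 374 | 0 | 1 | Ringhini, Mr. Sante | male | 22.0 | 0 | 0 | PC 17760 | 135.6333 | NaN | C
138 | 139 | 0 | 3 | Osen, Mr. Olaf Elon | male | 16.0 | 0 | 0 | 7534 | 9.2167 | NaN | S
396 | 397 | 0 | 3 | Olsson, Miss. Elina | female | 31.0 | 0 | 0 | 350407 | 7.8542 | NaN | S
91 | 92 | 0 | 3 | Andreasson, Mr. Paul Edvin | male | 20.0 | 0 | 0 | 347466 | 7.8542 | NaN | S
604 | 605 | 1 | 1 | Homer, Mr. Harry ("Mr E Haven") | male | 35.0 | 0 | 0 | 111426 | 26.5500 | NaN | C
252 | 253 | 0 | 1 | Stead, Mr. William Thomas | male | 62.0 | 0 | 0 | 113514 | 26.5500 | C87 | S
94 | 95 | 0 | 3 | Coxon, Mr. Daniel | male | 59.0 | 0 | 0 | 364500 | 7.2500 | NaN | S
787 | 788 | 0 | 3 | Rice, Master. George Hugh | male | 8.0 | 4 | 1 | 382652 | 29.1250 | NaN | Q
282 | 283 | 0 | 3 | de Pelsmaeker, Mr. Alfons | male | 16.0 | 0 | 0 | 345778 | 9.5000 | NaN | S
882 | 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.0 | 0 | 0 | 7552 | 10.5167 | NaN | S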

Dataset A

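(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
760 | 761 | 0 | 3 | Garfirth, Mr. John | male | NaN | 0 | 0 | 358585 | 14.5000 | NaN | S
696 | 697 | 0 | 3 | Kelly, Mr. James | male | 44.0 | 0 | 0 | 363592 | 8.0500 | NaN | S
551 | 552 | 0 | 2 | Sharp, Mr. Percival James R | male | 27.0 | 0 | 0 | 244358 | 26.0000 | NaN | S
822 | 823 | 0 | 1 | Reuchlin, Jonkheer. John George | male | 38.0 | 0 | 0 | 19972 | 0.0000 | NaN | S
578 | 579 | 0 | 3 | Caram, Mrs. Joseph (Maria Elias) | female | NaN | 1 | 0 | 2689 | 14.4583 | NaN | C
485 | 486 | 0 | 3 | Lefebre, Miss. Jeannie | female | NaN | 3 | 1 | 4133 | 25.4667 | NaN | S
179 | 180 | 0 | 3 | Leonard, Mr. Lionel | male | 36.0 | 0 | 0 | LINE | 0.0000 | NaN | S
544 | 545 | 0 | 1 | Douglas, Mr. Walter Donald | male | 50.0 | 1 | 0 | PC 17761 | 106.4250 | C86 | C
645 | 646 | 1 | 1 | Harper, Mr. Henry Sleeper | male | 48.0 | 1 | 0 | PC 17572 | 76.7292 | D33 | C
61 | 62 | 1 | 1 | Icard, Miss. Amelie | female | 38.0 | 0 | 0 | 113572 | 80.0000 | B28 | NaN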

Dataset B

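(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
643 | 644 | 1 | 3 | Foo, Mr. Choong | male | NaN | 0 | 0 | 1601 | 56.4958 | NaN | S
419 | 420 | 0 | 3 | Van Impe, Miss. Catharina | female | 10.0 | 0 | 2 | 345773 | 24.1500 | NaN | S
476 | 477 | 0 | 2 | Renouf, Mr. Peter Henry | male | 34.0 | 1 | 0 | 31027 | 21.0000 | NaN | S
275 | 276 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 | 1 | 0 | 13502 | 77.9583 | D7 | S
375 | 376 | 1 | 1 | Meyer, Mrs. Edgar Joseph (Leila Saks) | female | NaN | 1 | 0 | PC 17604 | 82.1708 | NaN | C
485 | 486 | 0 | 3 | Lefebre, Miss. Jeannie | female | NaN | 3 | 1 | 4133 | 25.4667 | NaN | S
136 | 137 | 1 | 1 | Newsom, Miss. Helen Monypeny | female | 19.0 | 0 | 2 | 11752 | 26.2833 | D47 | S
454 | 455 | 0 | 3 | Peduzzi, Mr. Joseph | male | NaN | 0 | 0 | A/5 2817 | 8.0500 | NaN | S
829 | 830 | 1 | 1 | Stone, Mrs. George Nelson (Martha Evelyn) | female | 62.0 | 0 | 0 | 113572 | 80.0000 | B28 | NaN
629 | 630 | 0 | 3 | O'Connell, Mr. Patrick D | male | NaN | 0 | 0 | 334912 | 7.7333 | NaN | Q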

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
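
The alerts report PassengerId as unique in both datasets, which already rules out exact duplicate rows. For data without such a key, a duplicate check can be sketched as counting identical rows. The example below is a plain-Python illustration under that assumption, not the report's implementation; the function name `duplicate_rows` and the sample rows are hypothetical. In pandas, `DataFrame.duplicated()` is the usual tool for the same job.

```python
from collections import Counter


def duplicate_rows(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical rows (PassengerId, Survived, Pclass); not taken from the report.
unique_rows = [(1, 0, 3), (2, 1, 1), (3, 1, 3)]
with_repeat = unique_rows + [(2, 1, 1)]

print(duplicate_rows(unique_rows))
print(duplicate_rows(with_repeat))
```

An empty result corresponds to the "Dataset does not contain duplicate rows." message above.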